Abstract

The tragedy of the digital commons does not prevent the copious
voluntary production of content that one witnesses on the web. We
show through an analysis of a massive data set from YouTube that
the productivity observed in crowdsourcing exhibits a strong positive
dependence on attention, measured by the number of downloads.
Conversely, a lack of attention leads to a decrease in the number of
videos uploaded and a consequent drop in productivity, which in
many cases asymptotes to no uploads whatsoever. Moreover, uploaders
compare themselves to others when their productivity is low and
to themselves when it exceeds a threshold.

We are witnessing an inversion of the traditional way by which content
has been generated and consumed over the centuries. From photography to 
news and encyclopedic knowledge, the centuries-old pattern has been one in 
which a relatively few people and organizations produce content and most 
people consume it. With the advent of the web and the ease with which 
one can migrate content to it, that pattern has reversed, leading to a sit- 
uation whereby millions create content in the form of blogs, news, videos, 
music, etc. and relatively few can attend to it all. This phenomenon, which 
goes under the name of crowdsourcing, is exemplified by websites such as 
Digg, Flickr, YouTube, and Wikipedia, where content creation without
the traditional quality filters manages to produce sought-after movies, news,
and even knowledge that rivals the best encyclopedias. That such content
is valued is confirmed by the fact that access to these sites accounts for a
sizable percentage of internet traffic. For example, as of June 2007 YouTube
alone comprised approximately 20% of all HTTP traffic, or nearly 10% of all 
traffic on the Internet p]. 

What makes crowdsourcing both interesting and puzzling is the under- 
lying dilemma facing every contributor, which is best exemplified by the 
well-known tragedy of the commons. In such dilemmas, a group of people 
attempts to provide a common good in the absence of a central authority. In 
the case of crowdsourcing, the common good is in the form of videos, music,
or encyclopedic knowledge that can be freely accessed by anyone. Furthermore,
the good has jointness of supply, which means that its consumption
by one user does not reduce the amount available to others. And since
it is nearly impossible to exclude non-contributors from using the common
good, it is rational for individuals not to upload content and free ride on
the production of others. The dilemma ensues when every individual can 
reason this way and free ride on the efforts of others, making everyone worse 
off — thus the tragedy of the digital commons IH El [71 O [TO] . 

And yet paradoxically, there is ample evidence that while the ratio of 
contributions to downloads is indeed small, the growth in content provision 
persists at levels that are hard to understand if analyzed from a public goods 
point of view. One possible explanation for this puzzling behavior, which 
we explore in this paper, is that those contributing to the digital commons 
perceive it as a private good, in which payment for their efforts is in the form
of the attention that their content gathers, measured in media downloads
or news items clicked on. As has been shown, attention is such a valued resource
that people are often willing to forsake financial gain to obtain it [6]. In
the world of academia, for example, attention is often the main currency, for
we publish to get the attention of others, we cite so that other researchers'
work gets attention, and we cherish the prominence of great work if only
because of the attention it gathers [1]. Similarly, within online communities,
status and recognition have been shown to be very important motivators for
contributing.

If attention is indeed the main driver of contributions to the digital com- 
mons, one should be able to observe a correlation between the rate at which 
content is generated and the number of downloads. And if in addition a 
causal relation between the two does exist, we expect that those contributors 
that have a high level of downloads will continue to contribute, whereas those 
who see a decline in the attention that their content is receiving will decrease 
their productivity. 

In order to investigate this conjecture we collected data from YouTube, a 
popular website that allows its users to upload, view, and share video clips. 
After a YouTube user uploads a video, a "view count" number is immediately 
displayed next to the video title, which measures how many times it has been 
watched. Our dataset contained 9,896,816 videos submitted by 579,471 users 
by April 30, 2008. For each video upload we obtained its datestamp, the 
uploader's id, and the final view count. 

To study the dynamic interplay between productivity and attention, we
partitioned each contributor's history into 2-week periods, starting when she
uploads her first video and ending when she uploads her last one. A common
pattern we observed is that most periods between a contributor's first and
last uploads contain no uploads at all (on average, 66% of these periods are
empty), indicating intermittent productivity. Because of the bursty nature of
our data, we considered only the "active" periods for each contributor (i.e.
periods containing at least one upload), and labeled them as t = 1, 2, ....

Footnote: Another important instance is open source software development.
Several studies have shown, however, that open source projects are
characterized by a very small core of contributors where the free-riding
problem is not acute.

We measured the productivity of each contributor by the number of videos
$n_t$ she uploads during the t'th active period, and the attention she receives by
the average number of views $v_t$ of the $n_t$ videos. In other words, we wanted to
establish how $v_t$ affects $n_{t+1}, n_{t+2}, \ldots$, which provides dynamical
information on how each contributor responds to different amounts of attention.
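
The bookkeeping described above can be sketched in a few lines. This is a
minimal illustration, not the code used for the study: the function name
`active_periods`, the input layout (a list of upload date and final view
count pairs), and the example history are all assumptions made here.

```python
from collections import defaultdict
from datetime import date

PERIOD_DAYS = 14  # the 2-week period length described in the text


def active_periods(uploads):
    """Return [(uploads_in_period, mean_views), ...] for one contributor.

    `uploads` is a list of (upload_date, final_view_count) pairs.
    Periods are counted from the contributor's first upload; periods
    without uploads are skipped, so the result lists only active periods
    in chronological order (t = 1, 2, ...).
    """
    if not uploads:
        return []
    first_day = min(day for day, _ in uploads)
    views_by_period = defaultdict(list)
    for day, views in uploads:
        index = (day - first_day).days // PERIOD_DAYS
        views_by_period[index].append(views)
    result = []
    for index in sorted(views_by_period):
        views = views_by_period[index]
        result.append((len(views), sum(views) / len(views)))
    return result


# Hypothetical upload history: (date, final view count).
example = [
    (date(2008, 1, 1), 100),
    (date(2008, 1, 5), 300),
    (date(2008, 2, 20), 50),  # 50 days after the first upload: period index 3
]
periods = active_periods(example)
```

Here the two empty periods between the January and February uploads are
dropped, leaving two active periods for this contributor.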

We conducted a robust linear regression
$\{n_{t+1}\}_{t=1}^{T-1} \sim \alpha \{\log_{10} v_t\}_{t=1}^{T-1} + \beta$
for each contributor that was active for $T \geq 10$ periods [12]. (Because the
view counts varied over many orders of magnitude, it made sense to consider
$\log_{10} v_t$ instead of $v_t$.) We thus collected 76,462 $\alpha$ values and
conducted a t-test of the null hypothesis that the $\alpha$ values come from a
normal distribution with non-positive mean. The resulting p-value is less than
0.001, suggesting that the null hypothesis can be rejected. We also conducted
the same test with different choices of T, and observed that as long as
$T \geq 10$ the p-value was always less than 0.001. Hence, for those
contributors who were active for a minimum number of periods, the more views
they received in one period, the more videos they uploaded during the
following period.
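
As an illustration of this per-contributor fit, the sketch below regresses
next-period uploads on the base-10 logarithm of current-period views and
then forms a one-sample t statistic over the resulting slopes. The text does
not say which robust estimator was used, so the Theil-Sen estimator (median
of pairwise slopes) is swapped in here as an assumption; all function names
and the example numbers are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median, stdev


def theil_sen_slope(xs, ys):
    """Median of pairwise slopes: a simple robust slope estimator.

    Raises statistics.StatisticsError if all xs are identical.
    """
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]
    ]
    return median(slopes)


def alpha_for_contributor(periods):
    """Slope of next-period uploads against log10 of current-period views.

    `periods` is a chronological list of (uploads, mean_views) pairs for
    the active periods; mean_views is assumed to be strictly positive.
    """
    xs = [math.log10(views) for _, views in periods[:-1]]
    ys = [uploads for uploads, _ in periods[1:]]
    return theil_sen_slope(xs, ys)


def one_sample_t(values):
    """t statistic against a mean of zero; large positive values argue
    against the null hypothesis of a non-positive mean."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical contributor whose uploads track log-views exactly.
example_periods = [(1, 10.0), (2, 100.0), (3, 1000.0), (4, 10.0)]
alpha = alpha_for_contributor(example_periods)

# Hypothetical slopes from three contributors.
t_stat = one_sample_t([0.5, 1.0, 1.5])
```

A one-sided p-value would follow from comparing `t_stat` with a Student t
distribution with `len(values) - 1` degrees of freedom; that step is left
out to keep the sketch dependency-free.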

A more direct approach to test our conjecture is to measure the change
in each contributor's productivity at different attention levels. For each
contributor who was active for at least two different periods, define
$\bar{v} = \mathrm{median}\{v_t\}_{t=1}^{T-1}$ as her median received attention,
where T is her number of active periods. According to this definition, all
periods can be divided into two groups of equal size,
$\lfloor (T-1)/2 \rfloor$: the "good periods" in which she receives higher than
usual attention ($G = \{s : v_s > \bar{v}\}$), and the "bad periods" in which
she receives lower than usual attention ($B = \{s : v_s < \bar{v}\}$).

Let $n^+ = \frac{1}{\lfloor (T-1)/2 \rfloor} \sum_{s \in G} n_{s+1}$ denote the
average productivity following a good period, and let
$n^- = \frac{1}{\lfloor (T-1)/2 \rfloor} \sum_{s \in B} n_{s+1}$ denote the
average productivity following a bad period. With these definitions the
difference $\Delta = n^+ - n^-$ measures the change of a contributor's
productivity between different attention levels. If $\Delta > 0$ contributors
upload more videos after obtaining more views, and if $\Delta < 0$ the opposite
is true.
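
A minimal sketch of this good-period versus bad-period comparison for a
single contributor follows. The function name, the input layout, and the
optional `threshold` argument (a fixed cutoff in place of the contributor's
own median) are assumptions made for illustration. The sketch also divides
by the actual size of each group, which matches the equal-size normalization
in the text only when no period ties with the cutoff.

```python
from statistics import median


def productivity_delta(periods, threshold=None):
    """Mean uploads after good periods minus mean uploads after bad ones.

    `periods` is a chronological list of (uploads, mean_views) pairs for
    one contributor's active periods. Returns None when either group is
    empty, since the difference is then undefined.
    """
    views = [v for _, v in periods[:-1]]        # attention in periods 1..T-1
    following = [n for n, _ in periods[1:]]     # uploads in periods 2..T
    if not views:
        return None
    cutoff = median(views) if threshold is None else threshold
    after_good = [n for v, n in zip(views, following) if v > cutoff]
    after_bad = [n for v, n in zip(views, following) if v < cutoff]
    if not after_good or not after_bad:
        return None
    return sum(after_good) / len(after_good) - sum(after_bad) / len(after_bad)


# Hypothetical contributor with five active periods.
example_periods = [(1, 10), (1, 500), (4, 20), (1, 900), (5, 30)]
delta = productivity_delta(example_periods)
```

In this made-up history the median of the first four view averages is 260,
the two periods above it are followed by 4 and 5 uploads, and the two below
it by 1 upload each, so the difference is positive.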
Figure 1 shows the histogram of the 20,061 different $\Delta$ values for the
group of contributors who were active for 2 to 9 periods. A t-test of the null
hypothesis that $\Delta \leq 0$ yields a p-value less than 0.001, leading to
rejection of the null hypothesis. Thus on average each contributor becomes more
productive after a good period than after a bad period.

Figure 1: Histogram of contributors' $\Delta$ values for contributors that were
active from 2 to 9 weeks. Notice that the maximum of the histogram is
shifted to the right of the origin. The null hypothesis that the data come
from a normal distribution with non-positive mean can be rejected with
p-value less than 0.001.

Figure 1 indicates that each contributor tends to become more productive
after receiving a number of views that exceeds her own normal performance.
One can also test whether her productivity increases as she outperforms the
average contributor in the general population. To do so, we measured the
average view count of all videos in our dataset, which is given by
$\bar{v} = 10000$, and used it to measure the productivity difference between
good periods (more than 10000 views on average) and bad periods (less than
10000 views on average) through the quantity $\Delta = n^+ - n^-$. We divided
the contributors into several different groups depending on their number of
active periods, and tested the null hypothesis "$\Delta \leq 0$" for each
group. Table 1 shows the results from these tests, including the number of
contributors considered in each subgroup, the mean of the $\Delta$ values, and
the p-values of the null hypothesis. Notice that the p-values are very small
for most groups, which supports our hypothesis that a competitive factor enters
into the productivity of contributors. Also note in Table 1 that the mean of
$\Delta$ decreases as the number of active weeks increases, indicating that
those people who made relatively few contributions care more about their
relative performance against other contributors.

For comparison purposes we also tested the same null hypothesis for
$\bar{v} = \mathrm{median}\{v_t\}_{t=1}^{T-1}$ (i.e. the median view count of
each contributor), which is not constant but varies from contributor to
contributor. The results are listed in Table 2. We see that in this case the
mean of $\Delta$ increases as the number of active weeks increases, indicating
that the productive ones care more about how they have improved their own
performance, rather than comparing with the rest of the community.



number of active weeks   number of contributors   $\Delta$-mean   p-value

2-9                      20061                    .59             < .001
10-19                    24517                    .58             < .001
20-29                    7789                     .32             < .001
30-39                    2153                     .09             .11
40-70                    515                      -.05            .61

Table 1: Tests of the null hypothesis "$\Delta \leq 0$", where
$\Delta = n^+ - n^-$ measures the productivity difference between a
contributor's good periods (in which her contributions received more than
10000 views on average) and bad periods (less than 10000 views on average).
As the number of active weeks increases, the mean of $\Delta$ decreases.



number of active weeks   number of contributors   $\Delta$-mean   p-value

2-9                      85949                    .01             .14
10-19                    68317                    .15             < .001
20-29                    14757                    .18             < .001
30-39                    3303                     .20             < .001
40-70                    673                      .26             < .01

Table 2: Tests of the null hypothesis "$\Delta \leq 0$", where
$\Delta = n^+ - n^-$ measures the productivity difference between a
contributor's good periods (in which her contributions received more than
her median view count) and bad periods (less than her median view count).
As the number of active weeks increases, the mean of $\Delta$ increases.

While the observed correlations between attention and productivity suggest
a trend, they do not imply a causal relation between them. In fact, it is
not clear whether an increase in attention causes productivity as a whole to
grow or vice versa. In order to clarify this issue we used a Granger causality
test, which is a statistical tool that determines causality in terms of
prediction accuracy [8]. Given two signals $X_1$ and $X_2$, we say that $X_1$
G-causes $X_2$ if past values of $X_1$ contain information that helps predict
future values of $X_2$. It is important to note that Granger causality is only
meaningful if it is found in one direction only, i.e. $X_1$ G-causes $X_2$ but
$X_2$ does not G-cause $X_1$. If on the other hand Granger causality is found
in both directions, it is likely that $X_1$ and $X_2$ are only correlated and
that the correlation is caused by a third signal.

In order to determine the causal relation between attention and productivity,
we defined $v_t$ to be the average of all contributors' views during
their t'th active period, and similarly we let $n_t$ be the average of all
contributors' video uploads during their t'th active week. We then conducted a
Granger causality test of the hypothesis that $v_t$ G-causes $n_t$, which
resulted in a p-value of 0.01, and of the hypothesis that $n_t$ G-causes $v_t$,
which gave a p-value of 0.61. This result shows that attention plays a
determining role in the productivity of those uploading videos.
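
To make the mechanics of such a two-directional test concrete, the sketch
below computes the F statistic of a Granger test with a single lag, using
only the standard library. The text does not state the number of lags or
the software used, so the lag choice, the helper names, and the synthetic
series are all assumptions; the data are generated so that the first series
drives the second by construction.

```python
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - tail) / aug[r][r]
    return coeffs


def residual_sum_of_squares(rows, targets):
    """Ordinary least squares via the normal equations; returns the RSS."""
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    moment = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve(gram, moment)
    return sum(
        (y - sum(b * x for b, x in zip(beta, r))) ** 2
        for r, y in zip(rows, targets)
    )


def granger_f(cause, effect):
    """F statistic for 'cause G-causes effect' using one lag of each series."""
    targets = effect[1:]
    steps = range(len(effect) - 1)
    restricted = [[1.0, effect[t]] for t in steps]        # own past only
    full = [[1.0, effect[t], cause[t]] for t in steps]    # plus the other series
    rss_restricted = residual_sum_of_squares(restricted, targets)
    rss_full = residual_sum_of_squares(full, targets)
    dof = len(targets) - 3
    return (rss_restricted - rss_full) / (rss_full / dof)


# Synthetic series: "uploads" follows the previous value of "attention".
rng = random.Random(0)
attention = [rng.gauss(0, 1) for _ in range(200)]
uploads = [0.0] + [
    0.8 * attention[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)
]
f_forward = granger_f(attention, uploads)
f_backward = granger_f(uploads, attention)
```

A p-value would be obtained by comparing each statistic with an F
distribution with 1 and `dof` degrees of freedom. A large `f_forward`
together with a small `f_backward` corresponds to the one-directional
pattern described above.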

Finally, since it is a common observation that many contributors stop
uploading videos, we decided to test if this behavior was due to the small
number of downloads their videos receive. To do so we considered all the
contributors in our dataset that had not uploaded any videos during the four
months previous to the date the data was collected.

Figure 2: Average number of views vs. i'th to last video. The origin
represents the last video. The average number of views decreases linearly
as contributors approach their last video, with a correlation of 0.90.

Figure 2 shows the average number of views as a function of the i'th to
last video. As can be seen, as contributors approach their last video upload
at the origin, the average number of previous views of their videos exhibits
a marked linear decrease. This confirms our conjecture that decreasing
attention leads to a lack of productivity, in this case to the point of making
contributors stop uploading any videos.
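
The quantity plotted in such a figure can be sketched as follows: for each
distance i from a contributor's last upload, average the view counts over
all contributors who have a video at that position, then measure how linear
the resulting curve is with a Pearson correlation. The function names and
the two example histories are hypothetical, and the sketch averages the
views of the i'th to last video itself, which is one reading of the text.

```python
import math


def average_views_by_distance(histories, max_distance):
    """Average views of the i'th to last video, for i = 0..max_distance.

    `histories` holds one list of view counts per contributor, in upload
    order. Assumes at least one contributor has more than `max_distance`
    videos, so every average is defined.
    """
    averages = []
    for i in range(max_distance + 1):
        values = [h[-1 - i] for h in histories if len(h) > i]
        averages.append(sum(values) / len(values))
    return averages


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical contributors whose views fall toward their last video.
histories = [[300, 200, 100], [500, 400, 300, 200]]
averages = average_views_by_distance(histories, max_distance=2)
r = pearson([0, 1, 2], averages)
```

In this toy input the averages rise by the same amount at each step away
from the last video, so the correlation with the distance is exactly linear.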

In summary, by analyzing a massive data set from YouTube we have
shown that the productivity observed in crowdsourcing exhibits a strong
positive dependence on attention. Conversely, a lack of attention leads to
a decrease in the number of videos uploaded and a consequent drop in
productivity, which in many cases asymptotes to no uploads whatsoever.
Moreover, we were able to determine that uploaders compare themselves to
others when their productivity is low and to themselves when it exceeds a
personal threshold. More generally, these results show that the tragedy of
the digital commons is partly overcome by making the uploading of digital
content a private good paid for by attention.